- Path: keats.ugrad.cs.ubc.ca!not-for-mail
- From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku)
- Newsgroups: comp.lang.c,comp.std.c,comp.lang.c++
- Subject: Re: Floating Point arithmetic problem
- Date: 14 Feb 1996 19:27:01 -0800
- Organization: Computer Science, University of B.C., Vancouver, B.C., Canada
- Message-ID: <4fu965INNq3g@keats.ugrad.cs.ubc.ca>
- References: <c968da6jzm.fsf@damayanti.india.ti.com>
- NNTP-Posting-Host: keats.ugrad.cs.ubc.ca
-
- In article <c968da6jzm.fsf@damayanti.india.ti.com>,
- Kuntal Shah <kuntal@india.ti.com> wrote:
- >
- >I am having a weird problem with floating point arithmetic. Gurus on
- >the net, please bail me out. I am working on the SUN 4.1.x platform.
- >
- >I have a "double" variable say d to which I need to add certain float
- >numbers of moderate magnitude (say less than 10000). This addition
- >occurs in a loop in my program which gets executed more than a million
- >times depending on my testcase.
- >
- > { /* loop begin */
- >
- > /* some code */
- >
- > d = d + f; /* f < 10000 */
-
- Ouch. Don't do that. The accumulated truncation error will kill ya.
-
- This can be fixed by using an integer counter which you convert to floating
- point and scale, like this:
-
- #define STEPS 100000
-
- {
-     int i;    /* loop counter */
-     double t; /* parameter, recomputed from i on every pass */
-
-     for (i = 0; i < STEPS; i++) {
-         t = (double) i / STEPS;
-         /* ... */
-     }
- }
-
- Here the loop counter iterates through 100,000 steps, and the floating
- point parameter t varies from 0 to 1. It does not suffer from cumulative
- rounding errors because it is recalculated from the exact integer value
- each time by a single division.
-
- >Now coming to the problem. The insignificant digits due to the
- >floating point representation keep accruing and there comes a stage
- >when the accrued value exceeds 0.0001 which results in failure of the
- >if condition in the above block of code, when ideally no such thing
- >should have occurred.
- >
- >All I need is a solution that will overcome this problem. Please bear
- >in mind that the loop is executed millions of times and hence any
- >costly operation within the loop will drastically bring down
- >performance.
- >
- >I have a few options to overcome this problem :-
- >
- >* After each addition, convert 'd' to an unsigned long after
- > multiplying by say 1e8, (thus truncating the unnecessary digits),
- > and divide it by 1e8 to get back the original value.
- >
- >* After every few additions, say 1000, do the above operation.
- >
- >In both the above operations, a severe problem would arise in cases
- >when the value represented is less than the value asked for. For
- >example,
- >
- > f= 213.22 would be represented as 213.2199999999999988631316228
- >
- >and cutting off the last few digits would result in negative accrual
- >in the wrong run. Since I am not sure of the value of f till run time,
- >I cannot solely depend on +ve or -ve accrual to happen.
- >
- >Do you have any solution to this problem? Ideally I would like the
- >following answers :-
-
- Try my above way of using an integer as a parameter. A 32-bit integer has
- plentiful range for any number of iterations you are likely to attempt and can
- be converted to floating point quite readily. On most workstations, the double
- type is 64 bits wide with a 52-bit mantissa, which can accommodate any 32-bit
- integer without truncation, so you should be OK regardless of what range of
- integers you convert to floating point.
-
- >* Is it possible to use bit wise operators (since they are lot faster
- > than other computations) to remove the least significant bits? I
- > tried doing this but wasn't all that effective.
-
- I strongly discourage you from even contemplating this. Floating point
- representations can vary wildly from architecture to architecture. The bit
- operators are really intended for unsigned integer operands.
-
- >* Is it possible to set to zero, say the last 10-15 digits of the
- > decimal part without any effect in the long run on the 5 digit
- > precision I require?
-
- Not easily. For one thing, the resulting number may not be representable in the
- machine's floating point format. The floating point format is typically binary,
- not decimal, and numbers that have terminating decimal digits in base ten may
- have repeating digits in base two.
-
- >* Is there any function that can round numbers off to the required
- > precision, ie, can I specify 0.66666666666623345 to be rounded off
- > to 0.666666666667 without undergoing the usual multiply, truncate,
- > divide flow.
-
- No.
-
- My advice: buy an undergraduate-level textbook on Numerical Analysis.
- A good book will explain floating point formats, coping with rounding and
- truncation errors and so forth, usually in the first chapter.
- --
-
-